ID3(Examples, Target_Attribute, Attributes)
    Create a root node Root for the tree
    If all examples are positive, return the single-node tree Root, with label = +
    If all examples are negative, return the single-node tree Root, with label = -
    If the set of predicting attributes is empty, return the single-node tree Root,
        with label = most common value of the target attribute in the examples
    Otherwise begin
        A ← the attribute that best classifies the examples
        Decision tree attribute for Root = A
        For each possible value vi of A:
            Add a new tree branch below Root, corresponding to the test A = vi
            Let Examples(vi) be the subset of examples that have the value vi for A
            If Examples(vi) is empty:
                Below this new branch add a leaf node with label = most common target value in the examples
            Else:
                Below this new branch add the subtree ID3(Examples(vi), Target_Attribute, Attributes – {A})
    End
    Return Root
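The recursion above can be sketched in Python. This is only an illustration under stated assumptions, not a reference implementation: the data representation (each example as a dict mapping attribute name to value) and the names `id3`, `most_common_value`, and `choose_attribute` are choices made here; the attribute-selection criterion itself is discussed below.

```python
from collections import Counter

def most_common_value(examples, target_attribute):
    """Most frequent value of the target attribute among the examples."""
    return Counter(e[target_attribute] for e in examples).most_common(1)[0][0]

def id3(examples, target_attribute, attributes, choose_attribute):
    """Build a decision tree following the ID3 pseudocode above.

    examples         -- list of dicts mapping attribute name -> value
    target_attribute -- name of the attribute to predict
    attributes       -- list of remaining predicting attribute names
    choose_attribute -- callable(examples, target_attribute, attributes)
                        returning the attribute that best classifies the examples
    Returns a class label (leaf) or a nested dict {attribute: {value: subtree}}.
    """
    labels = [e[target_attribute] for e in examples]

    # All examples share one label: return a single-node tree with that label.
    if len(set(labels)) == 1:
        return labels[0]

    # No predicting attributes left: return the most common target value.
    if not attributes:
        return most_common_value(examples, target_attribute)

    # A <- the attribute that best classifies the examples.
    best = choose_attribute(examples, target_attribute, attributes)
    tree = {best: {}}

    # One branch per value of A observed in the examples.  (The pseudocode's
    # "Examples(vi) is empty" case only arises when iterating over the full
    # value domain of A; here values are taken from the examples themselves,
    # so every subset is non-empty.)
    for value in set(e[best] for e in examples):
        subset = [e for e in examples if e[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, target_attribute, remaining, choose_attribute)

    return tree
```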
Entropy $H(S)$ is a measure of the amount of uncertainty in the data set $S$ (i.e. entropy characterizes the data set $S$). $$H(S) = - \sum_{x\in X} p(x) \log_2 p(x)$$
where,
- $S$ is the current data set for which entropy is being calculated (it changes at every iteration of ID3),
- $X$ is the set of classes in $S$,
- $p(x)$ is the proportion of the number of elements in class $x$ to the number of elements in $S$.
In ID3, entropy is calculated for each remaining attribute, i.e. for the partition of $S$ produced by splitting on that attribute. The attribute whose split yields the smallest (weighted) entropy is used to split the set $S$ on this iteration. The higher the entropy of $S$, the more uncertainty remains, and the greater the potential for a split to improve the classification.
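A direct translation of the entropy formula into Python, assuming for illustration that a data set is represented simply as a list of class labels:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H(S) = -sum_x p(x) * log2(p(x)), where p(x) is the proportion of
    examples in S belonging to class x."""
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())

# A set with 9 positive and 5 negative examples:
# H(S) = -(9/14)*log2(9/14) - (5/14)*log2(5/14) ≈ 0.940
print(entropy(['+'] * 9 + ['-'] * 5))
```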
Information gain $IG(A, S)$ is the measure of the difference in entropy from before to after the set $S$ is split on an attribute $A$; in other words, how much uncertainty in $S$ was reduced by splitting set $S$ on attribute $A$. $$IG(A, S) = H(S) - \sum_{t\in T} p(t)H(t)$$
where,
- $H(S)$ is the entropy of set $S$,
- $T$ is the set of subsets created by splitting $S$ on attribute $A$ (one subset per value of $A$, so that $S = \bigcup_{t \in T} t$),
- $p(t)$ is the proportion of the number of elements in $t$ to the number of elements in $S$,
- $H(t)$ is the entropy of subset $t$.
In ID3, information gain can be calculated (instead of entropy) for each remaining attribute. The attribute with the largest information gain is used to split the set $S$ on this iteration.
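Continuing the sketch above (and reusing its assumed representation and the `entropy` helper), information gain can be computed per attribute and passed to `id3` as its selection criterion. The attribute names in the commented usage line are purely hypothetical:

```python
def information_gain(examples, target_attribute, attribute):
    """IG(A, S) = H(S) - sum_t p(t) * H(t), where the subsets t are obtained
    by splitting the examples on the values of `attribute`."""
    total = len(examples)
    gain = entropy([e[target_attribute] for e in examples])
    for value in set(e[attribute] for e in examples):
        subset_labels = [e[target_attribute] for e in examples if e[attribute] == value]
        gain -= (len(subset_labels) / total) * entropy(subset_labels)
    return gain

def choose_by_information_gain(examples, target_attribute, attributes):
    """Pick the remaining attribute with the largest information gain."""
    return max(attributes, key=lambda a: information_gain(examples, target_attribute, a))

# Hypothetical usage with the id3 sketch above:
# tree = id3(examples, 'PlayTennis', ['Outlook', 'Humidity', 'Wind'],
#            choose_by_information_gain)
```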